Accessing Data on SGI Altix: An Experience with Reality
Abstract
The SGI Altix system architecture supports very large ccNUMA shared-memory systems. Nevertheless, the system layout limits the sustained memory performance, and these limits can only be avoided by selecting the “right” data access strategies. The paper presents the results of cache and memory performance studies on the SGI Altix 350. It demonstrates the limitations and benefits of the system and of the underlying Intel Itanium 2 processor.
Similar resources
Evaluating Performance of the SGI Altix 4700 via Scientific Benchmark and Micro-Benchmarks
I evaluated the performance of the SGI Altix 4700 using several well-known benchmarks. In performing these experiments, I hope to gain a better understanding of the capabilities and limitations of the system, and thus be able to improve upon the design in future generations or develop tools that enhance the performance of the system.
Optimizing OpenMP Parallelized DGEMM Calls on SGI Altix 3700
Using functions of parallelized mathematical libraries is a common way to accelerate numerical applications. Computer architectures with shared memory characteristics support different approaches for the implementation of such libraries, usually OpenMP or MPI. This paper’s content is based on the performance comparison of DGEMM calls (floating point matrix multiplication, double precision) with...
High Performance FFT on SGI Altix 3700
We have developed a high-performance FFT on SGI Altix 3700, improving the efficiency of the floating-point operations required to compute FFT by using a kind of loop fusion technique. As a result, we achieved a performance of 4.94 Gflops at 1-D FFT of length 4096 with an Itanium 2 1.3 GHz (95% of peak), and a performance of 28 Gflops at 2-D FFT of 4096 with 32 processors. Our FFT kernel outperf...
High performance computing using MPI and OpenMP on multi-core parallel systems
The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems—distributed memory across nodes and shared memory with non-uniform memory access within each node—poses a challenge to application developers. In this paper, we study a hybrid approach to programm...
Revealing the Performance of MPI RMA Implementations
The MPI remote-memory access (RMA) operations provide a different programming model from the regular MPI-1 point-to-point operations. This model is particularly appropriate for cases where there are multiple communication events for each synchronization and where the target memory locations are known by the source processes. In this paper, we describe a benchmark designed to illustrate the perf...
Publication date: 2006